Abstract

Recently several minimum free energy (MFE) folding algorithms for predicting the joint structure 
of two interacting RNA molecules have been proposed. Their folding targets are interaction 
structures that can be represented as diagrams with two backbones drawn horizontally on top
of each other such that (1) intramolecular and intermolecular bonds are noncrossing and (2) 
there is no "zig-zag" configuration. This paper studies joint structures with arc-length at least 
four in which both interior and exterior stack-lengths are at least two (no isolated arcs). The
key idea in this paper is to consider a new type of shape, based on which joint structures can be 
derived via symbolic enumeration. Our results imply simple asymptotic formulas for the number 
of joint structures with surprisingly small exponential growth rates. They are of interest in the 
context of designing prediction algorithms for RNA-RNA interactions. 
1. Introduction

RNA-RNA binding is an important phenomenon observed in various classes of non-coding
RNAs and plays a crucial role in a number of regulation processes. Regulatory
antisense RNAs control gene expression by prohibiting the translation of a target mRNA
through establishing stable base pairing interactions. Examples include the regulation
of translation in both prokaryotes [1] and eukaryotes [2, 3], the targeting of chemical
modifications [4], insertion editing [5], and transcriptional control [6]. Growing
evidence suggests that RNA-RNA interactions also play a role in the functionality of long
mRNA-like ncRNAs [7]. A common theme in many RNA classes, including miRNAs,
snRNAs, gRNAs, snoRNAs, and in particular many of the prokaryotic small RNAs, is
the formation of RNA-RNA interaction structures that are much more complex than
simple complementary sense-antisense interactions. The interaction between two RNAs
is governed by the same physical principles that determine RNA folding: the formation
of specific base-pairing patterns whose energy is largely determined by base pair stacking
and loop strains. As a result, secondary structures are an appropriate level of description
to quantitatively understand the thermodynamics of RNA-RNA binding.
Fig. 1. Natural joint structures. Known interaction bonds of TAR*(GA)-TAR [10]
and CopA-CopT [11] are displayed.



Alkan et al. [8] proved that the RNA-RNA interaction prediction (RIP) problem is in
its general form NP-complete. Nevertheless, there is increasing demand for efficient
computational methods for RIP. By restricting the space of allowed configurations,
polynomial-time algorithms on the secondary structure level have been derived.
Pervouchine [9] and Alkan et al. [8] proposed MFE folding algorithms for predicting the joint
structure of two interacting RNA molecules. In this model, "joint structure" means that
the intramolecular structures of each partner are pseudoknot-free, that the intermolecular
binding pairs are noncrossing, and that there is no so-called "zig-zag" configuration, see
Fig. 1. The optimal joint structure can be computed in $O(N^6)$ time and $O(N^4)$ space by
means of dynamic programming [8, 9, 12, 13]. Extensions involving the partition function
were proposed by Chitsaz et al. (piRNA) [13] and Huang et al. (rip) [12]. In contrast
to the situation for RNA secondary structures [15, 16], much less is known about joint
structures. Only joint structures of arc-length greater than or equal to two have been
studied in [17]. However, the biochemistry of nucleotide pairings favors parallel stacking
of bonds due to entropy, and intramolecular bonds have a minimum length of four.
Unfortunately, the biophysically relevant class of canonical joint structures with arc-length
$\ge 4$ is not covered by the framework in [17].

In this paper, we introduce a general framework for $\sigma$-canonical joint structures
having arc-length $\ge \sigma + 2$. In particular, our results apply to the class of canonical
joint structures having arc-length $\ge 4$. Our results are relevant for the design and analysis
of RIP folding algorithms and show that the numbers of $\sigma$-canonical joint structures with
arc-length $\ge \sigma + 2$ exhibit surprisingly small exponential growth rates.

This paper is organized as follows: in Section 2 we introduce joint structures along
the lines of [12] and in Section 3 we compute, along the lines of [18], the generating
function of refined shapes via symbolic enumeration. In Section 4 we show how to inflate
refined shapes into joint structures and derive the generating function of joint structures.
Section 5 presents the singularity analysis and asymptotic formulas. We finally put
our results into context in Section 6.

2. Secondary structures and joint structures 

Let us begin by discussing some basic results of [15, 16, 19]. Let $f(n)$ denote the number
of all noncrossing matchings of $n$ arcs, having the generating function
$F(z) = \sum_{n \ge 0} f(n) z^n$.

Recursions for $f(n)$ allow us to derive $zF(z)^2 - F(z) + 1 = 0$, that is, we have
\[ F(z) = \frac{1 - \sqrt{1 - 4z}}{2z}. \]
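As a quick sanity check of this functional equation, the following sketch computes $f(n)$ from the standard first-arc recursion $f(n+1) = \sum_{k=0}^{n} f(k) f(n-k)$ (an assumption here, since the recursion itself is not spelled out above) and confirms coefficient by coefficient that $zF(z)^2 - F(z) + 1$ vanishes. The function names are illustrative only.

```python
def noncrossing_matchings(n_max):
    """Return [f(0), ..., f(n_max)], the numbers of noncrossing matchings with n arcs."""
    f = [1]  # f(0) = 1: the empty matching
    for n in range(n_max):
        # Split at the arc incident to the first vertex: k arcs inside, n - k outside.
        f.append(sum(f[k] * f[n - k] for k in range(n + 1)))
    return f


def quadratic_residual(f):
    """Coefficients of z*F(z)^2 - F(z) + 1, truncated to len(f) terms."""
    n = len(f)
    square = [sum(f[k] * f[m - k] for k in range(m + 1)) for m in range(n)]
    residual = [-c for c in f]
    residual[0] += 1
    for m in range(1, n):
        residual[m] += square[m - 1]  # multiplying F^2 by z shifts it by one
    return residual


if __name__ == "__main__":
    f = noncrossing_matchings(8)
    print(f)
    print(quadratic_residual(f))
```

The computed values are the Catalan numbers, which is consistent with the closed form of $F(z)$.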
Let $\mathcal{T}_\sigma^{[\lambda]}$ denote the combinatorial class of $\sigma$-canonical secondary
structures having arc-length $\ge \lambda$ and $T_\sigma^{[\lambda]}(n)$ denote the number of all
$\sigma$-canonical secondary structures over $n$ vertices having arc-length $\ge \lambda$ and
\[ T_\sigma^{[\lambda]}(z) = \sum_{n \ge 0} T_\sigma^{[\lambda]}(n)\, z^n. \]



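Theorem 1. [19] Let $\sigma \in \mathbb{N}$, $z$ be an indeterminate and let
\[ u_\sigma(z) = \frac{(z^2)^{\sigma-1}}{z^{2\sigma} - z^2 + 1}, \qquad
   v_\lambda(z) = 1 - z + u_\sigma(z) \sum_{h=2}^{\lambda} z^h, \]
then $T_\sigma^{[\lambda]}(z)$, the generating function of $\sigma$-canonical structures with
minimum arc-length $\lambda$, is given by
\[ T_\sigma^{[\lambda]}(z) = \frac{1}{v_\lambda(z)}\,
   F\!\left( \left( \frac{\sqrt{u_\sigma(z)}\, z}{v_\lambda(z)} \right)^{2} \right). \]
Furthermore, for any specified $\lambda$ and $\sigma$,
\[ T_\sigma^{[\lambda]}(n) \sim c\, n^{-3/2} \left( \zeta_\sigma^{[\lambda]} \right)^{-n}
   \quad \text{for some } c > 0, \]
where $\zeta_\sigma^{[\lambda]}$ is the dominant singularity of $T_\sigma^{[\lambda]}(z)$ and the
minimal positive real solution of the equation
\[ \left( \frac{\sqrt{u_\sigma(z)}\, z}{v_\lambda(z)} \right)^{2} = \frac{1}{4}. \]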

Theorem 1 implies that for any specified $\lambda$ and $\sigma$, $T_\sigma^{[\lambda]}(z)$ is
algebraic over the rational function field $\mathbb{C}(z)$, since $F(z)$ is algebraic and
$v_\lambda(z)$, $u_\sigma(z)$ are both rational functions.
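The singularity equation of Theorem 1 is easy to solve numerically. The sketch below assumes the forms $u_\sigma(z) = z^{2\sigma-2}/(z^{2\sigma} - z^2 + 1)$ and $v_\lambda(z) = 1 - z + u_\sigma(z)\sum_{h=2}^{\lambda} z^h$ and finds the smallest positive $z$ with $u_\sigma(z) z^2 / v_\lambda(z)^2 = 1/4$ by a coarse scan followed by bisection. The function names, step size and tolerance are illustrative choices, not part of the paper.

```python
def u(sigma, z):
    """u_sigma(z) as assumed in the lead-in."""
    return z ** (2 * sigma - 2) / (z ** (2 * sigma) - z ** 2 + 1)


def v(sigma, lam, z):
    """v_lambda(z) = 1 - z + u_sigma(z) * (z^2 + ... + z^lambda)."""
    return 1 - z + u(sigma, z) * sum(z ** h for h in range(2, lam + 1))


def w(sigma, lam, z):
    """Argument of F in Theorem 1, i.e. (sqrt(u) * z / v)^2."""
    return u(sigma, z) * z ** 2 / v(sigma, lam, z) ** 2


def dominant_singularity(sigma, lam, step=1e-3, tol=1e-12):
    """Smallest positive root of w(z) = 1/4 on (0, 1)."""
    lo, hi = 0.0, step
    while w(sigma, lam, hi) < 0.25:      # coarse scan for the first crossing
        lo, hi = hi, hi + step
        if hi >= 1.0:
            raise ValueError("no root found in (0, 1)")
    while hi - lo > tol:                 # bisection on the bracketing interval
        mid = (lo + hi) / 2
        if w(sigma, lam, mid) < 0.25:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for sigma, lam in [(1, 2), (2, 4)]:
        zeta = dominant_singularity(sigma, lam)
        print(sigma, lam, zeta, 1 / zeta)
```

For $\sigma = 1$, $\lambda = 2$ the equation reduces to $z^2 - 3z + 1 = 0$, so the growth rate is $(3+\sqrt{5})/2$; for $\sigma = 2$, $\lambda = 4$ the growth rate should agree with the value 1.8489 quoted for canonical secondary structures in the Discussion.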

Given two RNA sequences $R = (R_i)_1^n$ and $S = (S_j)_1^m$ with $n$ and $m$ vertices, we
index the vertices such that $R_1$ is the 5' end of $R$ and $S_1$ is the 3' end of $S$. An
intramolecular base pair can be represented by an arc (interior), with its two endpoints
contained in either $R$ or $S$. Similarly, an extramolecular base pair can be represented
by an arc (exterior) with one of its endpoints contained in $R$ and the other in $S$. When
representing arc-configurations, we draw all $R$-arcs in the upper halfplane and all $S$-arcs
in the lower halfplane, see Fig. 2 (A).

We refer to the subgraph induced by $\{R_i, \ldots, R_j\}$ as $R[i,j]$. The subgraph $R[i,j]$
($S[i',j']$) is called a secondary segment if there is no exterior arc $R_kS_{k'}$ such that $i \le k \le j$
Fig. 2. (A): A joint structure $J(R,S,I)$ with arc-length $\ge 4$ and
stack-length $\ge 2$. Secondary segments (red): the subgraphs $R[15,21]$
and $S[12,19]$. Ancestors and descendants: for the exterior arc $R_5S_5$,
we have the following sets of $R$-ancestors and $S$-ancestors of $R_5S_5$:
$\{R_1R_{14}, R_2R_{13}, R_3R_9, R_4R_8\}$ and $\{S_1S_{21}, S_2S_{20}, S_3S_9, S_4S_8\}$. The exterior
arc $R_5S_5$ is a common descendant of $R_1R_{14}$ and $S_3S_9$, while $R_{10}S_{10}$ is not.
Subsumed arcs: $R_1R_{14}$ subsumes $S_3S_9$ and $S_1S_{21}$. (B): A zigzag, generated
by $R_2S_1$, $R_3S_3$ and $R_5S_4$.

($i' \le k' \le j'$), see Fig. 2 (A). An interior arc $R_iR_j$ is an $R$-ancestor of the exterior arc
$R_kS_{k'}$ if $i < k < j$. Analogously, $S_{i'}S_{j'}$ is an $S$-ancestor of $R_kS_{k'}$ if $i' < k' < j'$. We
also refer to $R_kS_{k'}$ as a descendant of $R_iR_j$ and $S_{i'}S_{j'}$ in this situation, see Fig. 2 (A).
Furthermore, we call $R_iR_j$ and $S_{i'}S_{j'}$ dependent if they have a common descendant and
independent otherwise. Let $R_iR_j$ and $S_{i'}S_{j'}$ be two dependent interior arcs. Then $R_iR_j$
subsumes $S_{i'}S_{j'}$, or $S_{i'}S_{j'}$ is subsumed in $R_iR_j$, if for any $R_kS_{k'} \in I$, $i' < k' < j'$ implies
$i < k < j$, that is, the set of descendants of $S_{i'}S_{j'}$ is contained in the set of descendants
of $R_iR_j$, see Fig. 2 (A). A zigzag is a subgraph containing two dependent interior arcs
$R_iR_j$ and $S_{i'}S_{j'}$, neither one subsuming the other, see Fig. 2 (B).

A joint structure [8, 9, 12, 13] $J(R, S, I)$, see Fig. 2 (A), is a graph such that

(1) $R$, $S$ are secondary structures (each nucleotide being paired with at most one other
nucleotide via hydrogen bonds, without internal pseudoknots);

(2) $I$ is a set of exterior arcs without external pseudoknots, i.e., if $R_iS_j, R_{i'}S_{j'} \in I$
then $i < i'$ implies $j < j'$;

(3) $J(R, S, I)$ contains no zigzags.
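The three defining conditions can be checked mechanically. The following is a minimal sketch, not taken from the paper: it assumes interior arcs are given as index pairs (i, j) with i < j, exterior arcs as pairs (R-index, S-index), and that no vertex is used by more than one arc. The helper names `crossing`, `descendants` and `is_joint_structure` are hypothetical.

```python
from itertools import combinations


def crossing(a, b):
    """True if two interior arcs on the same backbone cross."""
    (i, j), (k, l) = sorted([a, b])
    return i < k < j < l


def descendants(arc, exterior, side):
    """Exterior arcs lying under an interior arc; side 0 = R-arc, side 1 = S-arc."""
    i, j = arc
    return {e for e in exterior if i < e[side] < j}


def is_joint_structure(r_arcs, s_arcs, exterior):
    # (1) R and S carry no internal pseudoknots (no crossing interior arcs).
    for arcs in (r_arcs, s_arcs):
        if any(crossing(a, b) for a, b in combinations(arcs, 2)):
            return False
    # (2) Exterior arcs are noncrossing: i < i' implies j < j'.
    ext = sorted(exterior)
    if any(a[0] == b[0] or a[1] >= b[1] for a, b in zip(ext, ext[1:])):
        return False
    # (3) No zigzag: dependent interior arcs must be comparable by subsumption.
    for a in r_arcs:
        da = descendants(a, exterior, 0)
        for b in s_arcs:
            db = descendants(b, exterior, 1)
            if da & db and not (da <= db or db <= da):
                return False
    return True


if __name__ == "__main__":
    # R_2R_5 and S_2S_5 share the descendant R_4S_3, but each has a private one: a zigzag.
    print(is_joint_structure([(2, 5)], [(2, 5)], [(3, 1), (4, 3), (6, 4)]))
    # Dropping R_6S_4 makes R_2R_5 subsume S_2S_5.
    print(is_joint_structure([(2, 5)], [(2, 5)], [(3, 1), (4, 3)]))
```

The check compares descendant sets directly, which is quadratic in the number of interior arcs; that is adequate for illustrating the definitions on small examples.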

We next specify some notation:

• an interior arc (or simply arc) of length $\lambda$ is an arc $R_iR_j$ ($S_{i'}S_{j'}$) where $j - i = \lambda$
($j' - i' = \lambda$),

• an interior stack (or simply stack) of length $\sigma$ is a maximal sequence of "parallel"
interior arcs,
\[ (R_iR_j, R_{i+1}R_{j-1}, \ldots, R_{i+\sigma-1}R_{j-\sigma+1}) \quad \text{or} \quad
   (S_iS_j, S_{i+1}S_{j-1}, \ldots, S_{i+\sigma-1}S_{j-\sigma+1}), \]

• an exterior stack of length $\tau$ is a maximal sequence of "parallel" exterior arcs,
\[ (R_iS_{i'}, R_{i+1}S_{i'+1}, \ldots, R_{i+\tau-1}S_{i'+\tau-1}). \]
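Fig. 3. The four types of tight structures are defined as follows:
$\circ$: $\{R_iS_{i'}\} = J^{\circ}_{i,j;i',j'}$ and $i = j$, $i' = j'$;
$\triangledown$: $R_iR_j \in J^{\triangledown}_{i,j;i',j'}$ and $S_{i'}S_{j'} \notin J^{\triangledown}_{i,j;i',j'}$;
$\vartriangle$: $S_{i'}S_{j'} \in J^{\vartriangle}_{i,j;i',j'}$ and $R_iR_j \notin J^{\vartriangle}_{i,j;i',j'}$;
$\square$: $\{R_iR_j, S_{i'}S_{j'}\} \in J^{\square}_{i,j;i',j'}$.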




Fig. 4. Decomposition of joint structures. We display different secondary
segments (red) and tight structures ($\circ$ blue, $\triangledown$ green, $\vartriangle$ purple, $\square$ pink) into
which $J_{1,21;1,21}$ decomposes.

A $\sigma$-canonical joint structure is a joint structure with stack-length $\ge \sigma$. In Fig. 2 (A),
we give an example of a 2-canonical joint structure with arc-length $\ge 4$.

Let the block $J_{i,j;i',j'}$ denote the subgraph of a joint structure $J(R, S, I)$ induced by a
pair of subsequences $\{R_i, R_{i+1}, \ldots, R_j\}$ and $\{S_{i'}, S_{i'+1}, \ldots, S_{j'}\}$. Given a joint structure
$J(R,S,I)$, a tight structure of $J(R,S,I)$ is the minimal block $J_{i,j;i',j'}$ containing all the
$R$-ancestors and $S$-ancestors of any exterior arc in $J_{i,j;i',j'}$ and all the descendants of
any interior arc in $J_{i,j;i',j'}$. In the following, a tight structure is denoted by $J^{T}_{i,j;i',j'}$. In
particular, we denote the joint structure $J(R, S, I)$ by $J^{T}(R, S, I)$ if $J(R, S, I)$ is a tight
structure of itself. For any joint structure, there are only four types of tight structures
$J^{T}_{i,j;i',j'}$, that is $\{\circ, \triangledown, \vartriangle, \square\}$, denoted by
$J^{\{\circ, \triangledown, \vartriangle, \square\}}_{i,j;i',j'}$, respectively, see Fig. 3.

The key function of tight structures is that they are the building blocks for the decom- 
position of joint structures. 

Proposition 1. [12] Let $J(R, S, I)$ be a joint structure. Then

(1) any exterior arc $R_kS_{k'}$ in $J(R, S, I)$ is contained in a unique tight structure,

(2) $J(R, S, I)$ decomposes into a unique collection of tight structures and maximal
secondary segments.

In Fig. 4 we illustrate the decomposition of a joint structure.
3. Refined shapes



A shape [17] is a joint structure containing no secondary segments in which each interior 
stack and each exterior stack have length exactly one. We follow the ideas in [18] and 
obtain the generating function of joint structures via inflation of refined (colored) shapes. 
Refined shapes are obtained by distinguishing two classes of exterior shape-arcs. Each 
distinguished class requires its specific inflation-procedure (see Theorem 2). Let us have
a closer look at two particular classes of exterior arcs: 

• Class $A_1$: the class of arc-pairs $(\alpha, \beta)$ where $\alpha$ is an exterior arc with the unique
interior 2-arc $\beta$ as its ancestor.

• Class $A_2$: the class of arc-triples $(\alpha, \beta, \gamma)$ where $\alpha$ is an exterior arc with interior
2-arcs $\beta$ and $\gamma$ as ancestors.

Let $\mathcal{G}$ denote the combinatorial class of shapes. Given a joint structure, we can obtain
its shape by first removing all secondary segments and second collapsing any stack into
a single arc. That is, we have a map $\varphi\colon \mathcal{J} \to \mathcal{G}$. In Fig. 5 we illustrate how a joint
structure is projected into its refined shape. The resulting shape exhibits elements in
class $A_1$ as well as class $A_2$.



Fig. 5. Joint structures and their refined shapes: a 2-canonical joint structure
with arc-length $\ge 4$ (left) is projected to its refined shape (right), which exhibits
elements of class $A_1$ (red) and $A_2$ (blue), respectively.

Let $G(t, h, a_1, a_2)$ denote the number of shapes having a total of $t$ interior arcs and $h$ exterior
arcs, containing $a_1$ elements of class $A_1$ and $a_2$ elements of class $A_2$, and
\[ G(x, z, u, v) = \sum_{t, h, a_1, a_2} G(t, h, a_1, a_2)\, x^t z^h u^{a_1} v^{a_2}. \]
We shall proceed by revisiting the notions of tight shapes, double tight shapes, interaction
segments, closed shapes and right closed shapes [12]:

• a tight shape is tight as a structure. Let $\mathcal{G}^{T}$ denote the class of tight shapes and
$G^{T}(t, h, a_1, a_2)$ denote the number of tight shapes having a total of $t$ interior arcs and
$h$ exterior arcs, containing $a_1$ elements of class $A_1$ and $a_2$ elements of class $A_2$,
\[ G^{T}(x, z, u, v) = \sum_{t, h, a_1, a_2} G^{T}(t, h, a_1, a_2)\, x^t z^h u^{a_1} v^{a_2}. \]
Any tight shape comes as exactly one of the four types $\{\circ, \triangledown, \vartriangle, \square\}$. The
corresponding classes and generating functions are defined accordingly,
$\mathcal{G}^{\{\circ, \triangledown, \vartriangle, \square\}}$ and $G^{\{\circ, \triangledown, \vartriangle, \square\}}(x, z, u, v)$, respectively,

• a double tight shape is a shape whose leftmost and rightmost blocks are tight
structures. Let $\mathcal{G}^{DT}$ denote the class of double tight shapes and $G^{DT}(t, h, a_1, a_2)$
denote the number of double tight shapes having a total of $t$ interior arcs and $h$ exterior
arcs, containing $a_1$ elements of class $A_1$ and $a_2$ elements of class $A_2$,
\[ G^{DT}(x, z, u, v) = \sum_{t, h, a_1, a_2} G^{DT}(t, h, a_1, a_2)\, x^t z^h u^{a_1} v^{a_2}, \]

• a closed shape is a tight shape of type $\{\triangledown, \vartriangle, \square\}$. Let $\mathcal{G}^{C}$ denote the class of
closed shapes and $G^{C}(t, h, a_1, a_2)$ denote the number of closed shapes having a total of
$t$ interior arcs and $h$ exterior arcs, containing $a_1$ elements of class $A_1$ and $a_2$ elements
of class $A_2$,
\[ G^{C}(x, z, u, v) = \sum_{t, h, a_1, a_2} G^{C}(t, h, a_1, a_2)\, x^t z^h u^{a_1} v^{a_2}, \]

• a right closed shape is a shape whose rightmost block is a closed shape rather than
an exterior arc. Let $\mathcal{G}^{RC}$ denote the class of right closed shapes and $G^{RC}(t, h, a_1, a_2)$
denote the number of right closed shapes having a total of $t$ interior arcs and $h$ exterior
arcs, containing $a_1$ elements of class $A_1$ and $a_2$ elements of class $A_2$,
\[ G^{RC}(x, z, u, v) = \sum_{t, h, a_1, a_2} G^{RC}(t, h, a_1, a_2)\, x^t z^h u^{a_1} v^{a_2}, \]

• in a shape, an interaction segment is an empty structure or a tight structure of
type $\circ$ (an exterior arc). We denote the class of interaction segments by $\mathcal{I}$ and the
associated generating function by $I(x, z, u, v)$. Obviously, $I(x, z, u, v) = 1 + z$.

Lemma 1. The generating function $G(x, z, u, v)$ of refined shapes satisfies
\[ A(x, z)\, G(x, z, u, v)^2 + B(x, z, u, v)\, G(x, z, u, v) + C(x, z) = 0, \tag{3.1} \]
where
\[ \begin{aligned}
A(x, z) &= x(x+2)(z+1), \\
B(x, z, u, v) &= -\left( x(x+2)(z+1)^2 + (x+1)^2 - (2xu + x^2 v)\, z(z+1) \right), \\
C(x, z) &= (x+1)^2 (z+1).
\end{aligned} \tag{3.2} \]
Explicitly,
\[ G(x, z, u, v) = \frac{-B(x, z, u, v) - \sqrt{B(x, z, u, v)^2 - 4 A(x, z) C(x, z)}}{2 A(x, z)}. \tag{3.3} \]



Proof. Proposition 1 implies that any shape can be decomposed into a unique collection
of tight shapes. Furthermore, each shape can be decomposed into a unique collection of
closed shapes and exterior arcs. We decompose a shape in four steps, see Fig. 6. We
translate each decomposition step into the construction of combinatorial classes.

Fig. 6. The shape-grammar, with panels Step (1) to Step (4). The notations of structural
components are explained in the panel below. A: interaction segment; B: arbitrary
shape $G(R,S,I)$; C: right closed shape $G^{RC}(R,S,I)$; D: double tight
shape $G^{DT}(R,S,I)$; E: closed shape $G^{C}(R,S,I)$; F: type $\square$ tight shape
$G^{\square}(R,S,I)$; G: type $\triangledown$ tight shape $G^{\triangledown}(R,S,I)$; H: type $\vartriangle$ tight shape
$G^{\vartriangle}(R,S,I)$; I: type $\circ$ tight shape $G^{\circ}(R,S,I)$.

Step (1): decomposition into a right closed shape and its rightmost interaction segment,
i.e.
\[ \mathcal{G} = \mathcal{G}^{RC} \times \mathcal{I} + \mathcal{I}, \]
and Proposition 2 implies
\[ G(x, z, u, v) = G^{RC}(x, z, u, v) \cdot I(x, z, u, v) + I(x, z, u, v). \tag{3.4} \]
Step (2): splitting of the rightmost closed shape in a right closed shape, i.e.
\[ \mathcal{G}^{RC} = \mathcal{G} \times \mathcal{G}^{C}, \]
whence
\[ G^{RC}(x, z, u, v) = G(x, z, u, v) \cdot G^{C}(x, z, u, v). \tag{3.5} \]

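Step (3): type-dependent decomposition of a closed shape,
\[ \begin{aligned}
\mathcal{G}^{C} &= \mathcal{G}^{\triangledown} + \mathcal{G}^{\vartriangle} + \mathcal{G}^{\square}, \\
\mathcal{G}^{\triangledown} &= (\mathcal{Z}, \mathcal{Z}, \mathcal{Z}, \mathcal{E}) + (\mathcal{Z}, \mathcal{E}, \mathcal{E}, \mathcal{E}) \times \mathcal{G}^{DT}, \\
\mathcal{G}^{\vartriangle} &= (\mathcal{Z}, \mathcal{Z}, \mathcal{Z}, \mathcal{E}) + (\mathcal{Z}, \mathcal{E}, \mathcal{E}, \mathcal{E}) \times \mathcal{G}^{DT}, \\
\mathcal{G}^{\square} &= (\mathcal{Z} \times \mathcal{Z}, \mathcal{Z}, \mathcal{E}, \mathcal{Z}) + (\mathcal{Z} \times \mathcal{Z}, \mathcal{E}, \mathcal{E}, \mathcal{E}) \times \mathcal{G}^{DT}.
\end{aligned} \]
We therefore have
\[ \begin{aligned}
G^{C}(x, z, u, v) &= G^{\triangledown}(x, z, u, v) + G^{\vartriangle}(x, z, u, v) + G^{\square}(x, z, u, v), \\
G^{\triangledown}(x, z, u, v) &= x z u + x\, G^{DT}(x, z, u, v), \\
G^{\vartriangle}(x, z, u, v) &= x z u + x\, G^{DT}(x, z, u, v), \\
G^{\square}(x, z, u, v) &= x^2 z v + x^2\, G^{DT}(x, z, u, v).
\end{aligned} \tag{3.6} \]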

Step (4): obtaining double tight shapes of Step (3) by excluding the class of interaction
segments and the class of closed shapes, i.e.
\[ \mathcal{G} = \mathcal{G}^{DT} + \mathcal{I} + \mathcal{G}^{C}, \]
whence
\[ G^{DT}(x, z, u, v) = G(x, z, u, v) - I(x, z, u, v) - G^{C}(x, z, u, v). \tag{3.7} \]
Solving eqs. (3.4) - (3.7) leads to (1) the functional equation (3.1) and (2) eq. (3.2).
This quadratic equation together with the initial condition $G(0, 0, 0, 0) = 1$ implies
eq. (3.3). □
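The elimination in this proof can be cross-checked numerically. The sketch below is an illustration under stated assumptions rather than part of the proof: it takes the grammar equations in the form $G = G^{RC} I + I$, $G^{RC} = G\, G^{C}$, $G^{C} = 2xzu + x^2 zv + (2x + x^2) G^{DT}$, $G^{DT} = G - I - G^{C}$ with $I = 1 + z$, iterates them as a fixed point at one small sample point, and compares the limit with the closed form built from $A$, $B$, $C$. Function names and the sample point are arbitrary.

```python
import math


def coefficients(x, z, u, v):
    """A, B, C of the quadratic equation for G(x, z, u, v)."""
    a = x * (x + 2) * (z + 1)
    b = -(x * (x + 2) * (z + 1) ** 2 + (x + 1) ** 2
          - (2 * x * u + x * x * v) * z * (z + 1))
    c = (x + 1) ** 2 * (z + 1)
    return a, b, c


def g_closed_form(x, z, u, v):
    """Root of the quadratic with the minus sign, i.e. the branch with G(0,0,0,0) = 1."""
    a, b, c = coefficients(x, z, u, v)
    return (-b - math.sqrt(b * b - 4 * a * c)) / (2 * a)


def g_from_grammar(x, z, u, v, rounds=200):
    """Fixed-point iteration of the shape grammar, started at the empty shape."""
    i_seg = 1 + z
    g = 1.0
    for _ in range(rounds):
        # Closed shapes after eliminating G^DT = G - I - G^C; note 1 + 2x + x^2 = (x+1)^2.
        g_closed = (z * (2 * x * u + x * x * v)
                    + x * (x + 2) * (g - i_seg)) / (x + 1) ** 2
        # G = G^RC * I + I with G^RC = G * G^C.
        g = i_seg * (1 + g * g_closed)
    return g


if __name__ == "__main__":
    point = (0.1, 0.1, 0.5, 0.5)
    print(g_closed_form(*point), g_from_grammar(*point))
```

The iteration is only expected to converge for small arguments, well inside the radius of convergence; it is meant as a consistency check of the signs and exponents in $A$, $B$ and $C$.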



4. The generating function of joint structures 

Let $\mathcal{H}_\sigma$ denote the class of $\sigma$-canonical joint structures with arc-length $\ge \sigma + 2$. Let
$H_\sigma(s)$ denote the number of joint structures in $\mathcal{H}_\sigma$ with a total of $s$ vertices, having the
generating function
\[ H_\sigma(x) = \sum_{s \ge 0} H_\sigma(s)\, x^s. \]
We are now in position to compute the generating function $H_\sigma(x)$. Our strategy is
to inflate the shapes, applying a specific inflation to specific exterior arcs.

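Theorem 2. For any $\sigma \ge 1$, $H_\sigma(x)$ is a power series and
\[ H_\sigma(x) = T_\sigma^{[\sigma+2]}(x)^2\, G(\eta, \eta, \eta_1, \eta_2), \tag{4.1} \]
where
\[ \begin{aligned}
\eta &= \frac{x^{2\sigma}\, T_\sigma^{[\sigma+2]}(x)^2}{1 - x^2 - x^{2\sigma}\left( T_\sigma^{[\sigma+2]}(x)^2 - 1 \right)}, \\
\eta_1 &= \frac{-1 + x^2 - x^{2\sigma} + (1 + x^{2\sigma})\, T_\sigma^{[\sigma+2]}(x)^2}{T_\sigma^{[\sigma+2]}(x)^2}, \\
\eta_2 &= \frac{1 - x^2 + x^{2\sigma} + (-2 + 2x^2 - 3x^{2\sigma})\, T_\sigma^{[\sigma+2]}(x)^2 + (1 + 2x^{2\sigma})\, T_\sigma^{[\sigma+2]}(x)^4}{T_\sigma^{[\sigma+2]}(x)^4}.
\end{aligned} \]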

Proof. Let $\mathcal{G}(t, h, a_1, a_2)$ denote the class of shapes having a total of $t$ interior arcs and $h$
exterior arcs, containing $a_1$ elements of class $A_1$ and $a_2$ elements of class $A_2$. For any joint
structure in $\mathcal{H}_\sigma$, we obtain a shape in $\mathcal{G}$ as follows:

(1) remove all secondary segments,

(2) collapse each interior stack into one interior arc and each exterior stack into one
exterior arc.

Then we have the surjective map
\[ \varphi\colon \mathcal{H}_\sigma \to \mathcal{G}. \]
Fig. 7. Step I: a shape (left) is inflated to a joint structure with arc-length
$\ge 2$ and interior stack-length $\ge 2$. Each interior arc in the shape is first
inflated to a stack of size at least two (middle) and second inflated by
adding one induced stack of size two (right). Note that there are three
ways to insert the secondary segments to separate the induced stacks (red).



Indeed, for any shape $\gamma$ in $\mathcal{G}$, we can construct $\sigma$-canonical joint structures with arc-length
$\ge \sigma + 2$. The map $\varphi\colon \mathcal{H}_\sigma \to \mathcal{G}$ induces the partition
$\mathcal{H}_\sigma = \dot\cup_{\gamma}\, \varphi^{-1}(\gamma)$, whence
\[ H_\sigma(x) = \sum_{\gamma \in \mathcal{G}} H_\gamma(x), \tag{4.2} \]
where $H_\gamma(x)$ denotes the generating function of joint structures having shape $\gamma$. We
proceed by computing the generating function $H_\gamma(x)$. We will construct $H_\gamma(x)$ via simpler
combinatorial classes as building blocks, considering stems, stacks, induced stacks, interior
arcs, exterior arcs and secondary segments. We inflate a shape $\gamma$ in $\mathcal{G}(t, h, a_1, a_2)$ to a
joint structure in $\mathcal{H}_\sigma$ in four steps.

Step I: we inflate any interior arc in $\gamma$ to a stack of size at least $\sigma$ and subsequently
add additional stacks. The latter are called induced stacks and have to be separated by
means of inserting secondary segments, see Fig. 7. Note that during this first inflation
step no secondary segments, other than those necessary for separating the nested stacks,
are inserted. We generate

• secondary segments $\mathcal{T}_\sigma^{[\sigma+2]}$ having arc-length $\ge \sigma + 2$ and stack-length $\ge \sigma$ with
generating function $T_\sigma^{[\sigma+2]}(x)$,

• interior arcs $\mathcal{R}$ with generating function $R(x) = x^2$,

• stacks, i.e. pairs consisting of the minimal sequence of arcs $\mathcal{R}^\sigma$ and an arbitrary
extension consisting of arcs of arbitrary finite length,
\[ \mathcal{K}_\sigma = \mathcal{R}^\sigma \times \mathrm{Seq}(\mathcal{R}), \]
having the generating function
\[ K_\sigma(x) = x^{2\sigma} \cdot \frac{1}{1 - x^2}, \]

• induced stacks, i.e. stacks together with at least one secondary segment on either
or both of its sides,
\[ \mathcal{N}_\sigma = \mathcal{K}_\sigma \times \left( (\mathcal{T}_\sigma^{[\sigma+2]})^2 - 1 \right), \]
having the generating function
\[ N_\sigma(x) = \frac{x^{2\sigma}}{1 - x^2} \left( T_\sigma^{[\sigma+2]}(x)^2 - 1 \right), \]

• stems, that is pairs consisting of stacks $\mathcal{K}_\sigma$ and an arbitrarily long sequence of
induced stacks,
\[ \mathcal{M}_\sigma = \mathcal{K}_\sigma \times \mathrm{Seq}(\mathcal{N}_\sigma), \]
having the generating function
\[ M_\sigma(x) = \frac{K_\sigma(x)}{1 - N_\sigma(x)}
   = \frac{x^{2\sigma}}{1 - x^2 - x^{2\sigma}\left( T_\sigma^{[\sigma+2]}(x)^2 - 1 \right)}. \]

Note that we inflate both top and bottom sequences. The corresponding generating
function is $M_\sigma(x)^t$.

Step II: we inflate any exterior arc in $\gamma$ that is not an element of class $A_1$ or $A_2$ to
an exterior stack of size at least $\sigma$ and subsequently add additional exterior stacks. The
latter are called induced exterior stacks and have to be separated by means of inserting
secondary segments, see Fig. 8. Note that during this exterior-arc inflation no secondary
segments, other than those necessary for separating the stacks, are inserted. We generate

• exterior arcs $\mathcal{R}_0$ having the generating function $R_0(x) = x^2$,

• exterior stacks, i.e. pairs consisting of the minimal sequence of exterior arcs $\mathcal{R}_0^\sigma$
and an arbitrary extension consisting of exterior arcs of arbitrary finite length,
\[ \mathcal{K}^*_\sigma = \mathcal{R}_0^\sigma \times \mathrm{Seq}(\mathcal{R}_0), \]
having the generating function
\[ K^*_\sigma(x) = x^{2\sigma} \cdot \frac{1}{1 - x^2}, \]

• induced exterior stacks, i.e. stacks together with at least one secondary segment
on either or both of its sides,
\[ \mathcal{N}^*_\sigma = \mathcal{K}^*_\sigma \times \left( (\mathcal{T}_\sigma^{[\sigma+2]})^2 - 1 \right), \]
having the generating function
\[ N^*_\sigma(x) = \frac{x^{2\sigma}}{1 - x^2} \left( T_\sigma^{[\sigma+2]}(x)^2 - 1 \right), \]

• exterior stems, that is pairs consisting of exterior stacks $\mathcal{K}^*_\sigma$ and an arbitrarily long
sequence of induced exterior stacks,
\[ \mathcal{M}^*_\sigma = \mathcal{K}^*_\sigma \times \mathrm{Seq}(\mathcal{N}^*_\sigma), \]
having the generating function
\[ M^*_\sigma(x) = \frac{K^*_\sigma(x)}{1 - N^*_\sigma(x)}
   = \frac{x^{2\sigma}}{1 - x^2 - x^{2\sigma}\left( T_\sigma^{[\sigma+2]}(x)^2 - 1 \right)}. \]

Inflating all exterior arcs that are not contained in classes $A_1$ or $A_2$, we obtain
$M^*_\sigma(x)^{h - a_1 - a_2}$.

Fig. 8. Step II: a joint structure (top-left) obtained in (1) in Fig. 7 is inflated
to a new joint structure. Each exterior arc, not contained in classes
$A_1$ or $A_2$, is first inflated to an exterior stack of size at least two (bottom)
and second inflated by adding one exterior induced stack of size two
(top-right). Note that just one of the three possible ways of inserting the
secondary segments in order to separate the induced exterior stacks (red)
is displayed.

Step III: We inflate exterior arcs contained in classes $A_1$ and $A_2$ by inserting additional
secondary segments at positions between the exterior arc and interior 2-arc, see Fig. 9.
In contrast to Step II, specific "unwanted" scenarios are excluded. We generate

• Class $A_1$: Excluding the case where the exterior arc is inflated to an exterior
stack of length $\sigma$ and no additional secondary segment is inserted at the position
between the exterior arc and interior 2-arc, see Fig. 10, we arrive at
\[ \mathcal{M}^*_\sigma \times (\mathcal{T}_\sigma^{[\sigma+2]})^2 - \mathcal{R}_0^\sigma, \]
having the generating function
\[ M^*_\sigma(x)\, T_\sigma^{[\sigma+2]}(x)^2 - x^{2\sigma}, \]

• Class $A_2$: There are three scenarios which create an interior arc of arc-length
$< \sigma + 2$, see Fig. 11:
Fig. 9. Step III: a joint structure (left) obtained in the bottom in Fig. 8 is
inflated to a new joint structure with arc-length $\ge 4$ (right). Each exterior
arc in classes $A_1$ or $A_2$ is inflated to an exterior stack of size at least two
(red) and additional secondary segments (blue) are inserted at the positions
between the exterior arc and interior 2-arc.

Fig. 10. "Bad" scenario for class $A_1$: In an element of class $A_1$ (left),
the exterior arc is inflated to an exterior stack of length 2 and no additional
secondary segment is inserted at the position between the exterior arc and
interior 2-arc, leading to an interior arc having arc-length $< 4$ (right).



- the exterior arc is inflated to an exterior stack of length $\sigma$ and no additional
secondary segment is inserted at the positions in both top and bottom sequences,
resulting in both interior arcs having arc-length $< \sigma + 2$,

- the exterior arc is inflated to an exterior stack of length $\sigma$ and an additional
secondary segment is inserted at the positions only in the bottom sequence,
resulting in an interior arc in the top having arc-length $< \sigma + 2$,

- the exterior arc is inflated to an exterior stack of length $\sigma$ and an additional
secondary segment is inserted at the positions only in the top sequence, resulting
in an interior arc in the bottom having arc-length $< \sigma + 2$.

Excluding these three scenarios, we obtain
\[ \mathcal{M}^*_\sigma \times (\mathcal{T}_\sigma^{[\sigma+2]})^4
   - 2\, \mathcal{R}_0^\sigma \times \left( (\mathcal{T}_\sigma^{[\sigma+2]})^2 - 1 \right) - \mathcal{R}_0^\sigma, \]
having the generating function
\[ \begin{aligned}
& M^*_\sigma(x)\, T_\sigma^{[\sigma+2]}(x)^4 - 2 x^{2\sigma}\left( T_\sigma^{[\sigma+2]}(x)^2 - 1 \right) - x^{2\sigma} \\
&\quad = M^*_\sigma(x)\, T_\sigma^{[\sigma+2]}(x)^4 - x^{2\sigma}\left( 2\, T_\sigma^{[\sigma+2]}(x)^2 - 1 \right).
\end{aligned} \]
Applying Step III for each $A_1$- and $A_2$-element, we derive
\[ \left( M^*_\sigma(x)\, T_\sigma^{[\sigma+2]}(x)^2 - x^{2\sigma} \right)^{a_1}
   \left( M^*_\sigma(x)\, T_\sigma^{[\sigma+2]}(x)^4 - x^{2\sigma}\left( 2\, T_\sigma^{[\sigma+2]}(x)^2 - 1 \right) \right)^{a_2}. \]
Fig. 11. "Bad" scenarios for class $A_2$: In an element of class $A_2$ (top),
there are three scenarios leading to interior arcs in both top and bottom
having arc-length $< 4$ (bottom-left), the interior arc only in the top having
arc-length $< 4$ (bottom-middle), the interior arc only in the bottom having
arc-length $< 4$ (bottom-right).





Fig. 12. Step IV: a joint structure (left) obtained in Fig. 9 is inflated to a
new joint structure (right) by inserting secondary segments (red).



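Step IV: Here we insert additional secondary segments at the remaining
$(2t + 2h + 2 - 2a_1 - 4a_2)$ positions, see Fig. 12. Formally, this fourth inflation is
expressed via the combinatorial class
\[ \left( \mathcal{T}_\sigma^{[\sigma+2]} \right)^{2t + 2h + 2 - 2a_1 - 4a_2} \]
with the generating function $T_\sigma^{[\sigma+2]}(x)^{2t + 2h + 2 - 2a_1 - 4a_2}$.
Combining Steps I - IV, we arrive at
\[ \begin{aligned}
H_\gamma(x) &= M_\sigma(x)^{t}\, M^*_\sigma(x)^{h - a_1 - a_2}
   \left( M^*_\sigma(x)\, T_\sigma^{[\sigma+2]}(x)^2 - x^{2\sigma} \right)^{a_1} \\
&\quad \times \left( M^*_\sigma(x)\, T_\sigma^{[\sigma+2]}(x)^4 - x^{2\sigma}\left( 2\, T_\sigma^{[\sigma+2]}(x)^2 - 1 \right) \right)^{a_2}
   T_\sigma^{[\sigma+2]}(x)^{2t + 2h + 2 - 2a_1 - 4a_2} \\
&= T_\sigma^{[\sigma+2]}(x)^2\, \eta^{t}\, \eta^{h}\, \eta_1^{a_1}\, \eta_2^{a_2},
\end{aligned} \]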
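where
\[ \begin{aligned}
\eta &= \frac{x^{2\sigma}\, T_\sigma^{[\sigma+2]}(x)^2}{1 - x^2 - x^{2\sigma}\left( T_\sigma^{[\sigma+2]}(x)^2 - 1 \right)}, \\
\eta_1 &= \frac{\eta - x^{2\sigma}}{\eta}
   = \frac{-1 + x^2 - x^{2\sigma} + (1 + x^{2\sigma})\, T_\sigma^{[\sigma+2]}(x)^2}{T_\sigma^{[\sigma+2]}(x)^2}, \\
\eta_2 &= \frac{\eta\, T_\sigma^{[\sigma+2]}(x)^2 - x^{2\sigma}\left( 2\, T_\sigma^{[\sigma+2]}(x)^2 - 1 \right)}{\eta\, T_\sigma^{[\sigma+2]}(x)^2} \\
   &= \frac{1 - x^2 + x^{2\sigma} + (-2 + 2x^2 - 3x^{2\sigma})\, T_\sigma^{[\sigma+2]}(x)^2 + (1 + 2x^{2\sigma})\, T_\sigma^{[\sigma+2]}(x)^4}{T_\sigma^{[\sigma+2]}(x)^4}.
\end{aligned} \]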

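In view of $T_\sigma^{[\sigma+2]}(0) = 1$, the inverse $[T_\sigma^{[\sigma+2]}(x)]^{-1}$ exists and accordingly $\eta$, $\eta_1$ and $\eta_2$ are
well defined. Since for any $\gamma, \gamma_1 \in \mathcal{G}(t, h, a_1, a_2)$ we have $H_\gamma(x) = H_{\gamma_1}(x)$, we derive
\[ H_\sigma(x) = \sum_{\gamma \in \mathcal{G}} H_\gamma(x)
   = \sum_{t, h, a_1, a_2} G(t, h, a_1, a_2)\, H_\gamma(x), \]
where on the right-hand side $\gamma$ is any shape in $\mathcal{G}(t, h, a_1, a_2)$.
Using $G(x, z, u, v) = \sum_{t, h, a_1, a_2} G(t, h, a_1, a_2)\, x^t z^h u^{a_1} v^{a_2}$, we arrive at
\[ H_\sigma(x) = T_\sigma^{[\sigma+2]}(x)^2\, G(\eta, \eta, \eta_1, \eta_2). \]
It remains to verify that $H_\sigma(x)$ is indeed a power series, which follows from the fact that
the constant coefficients of $\eta$, $\eta_1$ and $\eta_2$, regarded as formal power series, are zero. □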
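The composition $H_\sigma(x) = T_\sigma^{[\sigma+2]}(x)^2\, G(\eta, \eta, \eta_1, \eta_2)$ can be expanded with exact truncated power series. The sketch below is one way to do this and is not the authors' implementation. It assumes that $T = T_\sigma^{[\lambda]}$ satisfies $u_\sigma x^2 T^2 - v_\lambda T + 1 = 0$ (which follows from $T = F(u_\sigma x^2 / v_\lambda^2)/v_\lambda$ and $zF^2 - F + 1 = 0$) with $u_\sigma = x^{2\sigma-2}/(1 - x^2 + x^{2\sigma})$ and $v_\lambda = 1 - x + u_\sigma \sum_{h=2}^{\lambda} x^h$, and that $G$ solves $A G^2 + B G + C = 0$ with $A$, $B$, $C$ as in Lemma 1. Both equations are solved by fixed-point iteration; all helper names are hypothetical.

```python
def mul(a, b):
    """Product of two power series given as equal-length coefficient lists (truncated)."""
    n = len(a)
    c = [0] * n
    for i, ai in enumerate(a):
        if ai:
            for j in range(n - i):
                c[i + j] += ai * b[j]
    return c


def inv(a):
    """Reciprocal of a truncated power series whose constant term is 1."""
    assert a[0] == 1
    b = [0] * len(a)
    b[0] = 1
    for k in range(1, len(a)):
        b[k] = -sum(a[i] * b[k - i] for i in range(1, k + 1))
    return b


def lin(*terms):
    """Linear combination c1*s1 + c2*s2 + ... of (coefficient, series) pairs."""
    return [sum(c * s[k] for c, s in terms) for k in range(len(terms[0][1]))]


def mono(power, n):
    """The monomial x^power as a truncated series of length n."""
    return [1 if k == power else 0 for k in range(n)]


def t_series(sigma, lam, n):
    """First n coefficients of T_sigma^{[lam]}(x); requires lam >= 2."""
    one = mono(0, n)
    u = mul(mono(2 * sigma - 2, n),
            inv(lin((1, one), (-1, mono(2, n)), (1, mono(2 * sigma, n)))))
    tail = lin(*[(1, mono(h, n)) for h in range(2, lam + 1)])
    v_inv = inv(lin((1, one), (-1, mono(1, n)), (1, mul(u, tail))))
    w = mul(mono(2, n), u)
    t = one
    for _ in range(n):  # T = (1 + u x^2 T^2) / v; each pass fixes more coefficients
        t = mul(v_inv, lin((1, one), (1, mul(w, mul(t, t)))))
    return t


def h_series(sigma, n):
    """First n coefficients of H_sigma(x) = T^2 * G(eta, eta, eta1, eta2)."""
    one, q = mono(0, n), mono(2 * sigma, n)
    p = lin((1, one), (-1, mono(2, n)), (1, q))          # p = 1 - x^2 + x^(2 sigma)
    t = t_series(sigma, sigma + 2, n)
    t2 = mul(t, t)
    t2_inv = inv(t2)
    eta = mul(mul(q, t2), inv(lin((1, p), (-1, mul(q, t2)))))
    eta1 = lin((1, one), (1, q), (-1, mul(p, t2_inv)))
    eta2 = lin((1, mul(p, mul(t2_inv, t2_inv))),
               (-1, mul(lin((2, p), (1, q)), t2_inv)), (1, one), (2, q))
    e1 = lin((1, eta), (1, one))                          # eta + 1
    e2 = lin((1, eta), (2, one))                          # eta + 2
    a = mul(mul(eta, e2), e1)                             # A(eta, eta)
    c = mul(mul(e1, e1), e1)                              # C(eta, eta)
    mix = lin((2, mul(eta, eta1)), (1, mul(mul(eta, eta), eta2)))
    neg_b = lin((1, mul(a, e1)), (1, mul(e1, e1)), (-1, mul(mix, mul(eta, e1))))
    neg_b_inv = inv(neg_b)
    g = one
    for _ in range(n):  # G = (C + A G^2) / (-B)
        g = mul(neg_b_inv, lin((1, c), (1, mul(a, mul(g, g)))))
    return mul(t2, g)


if __name__ == "__main__":
    print(h_series(2, 12))
```

For $\sigma = 2$ the first coefficients can be checked by hand: up to $s = 3$ only arc-free structures exist, giving $s + 1$ ways to split the vertices between $R$ and $S$; at $s = 4$ a single exterior stack of length two becomes possible, and at $s = 5$ there are four ways to add one unpaired vertex to it.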

5. Asymptotic Enumeration 

In this section, we derive simple formulas for the number of joint structures in the limit 
of long sequences. 

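Theorem 3. For $\sigma \ge 1$, $H_\sigma(x)$ is algebraic and we have
\[ H_\sigma(s) \sim c_\sigma\, s^{-3/2} \left( \kappa_\sigma^{-1} \right)^{s} \quad \text{for some } c_\sigma, \tag{5.1} \]
where $\kappa_\sigma$ is the minimal positive real solution of the equation
$Q(x, T_\sigma^{[\sigma+2]}(x)) = 0$, see Table 1, where
\[ \begin{aligned}
Q(x, y) &= (1 - x^2 + x^{2\sigma})^4
  + 2x^{2\sigma} (1 - x^2 + x^{2\sigma})^2 \left( -3 + 3x^2 - x^{2\sigma} + x^{4\sigma} - 2x^{2+2\sigma} \right) y^2 \\
&\quad + x^{4\sigma} \big( 3 - 6x^2 + 3x^4 + 10x^{2\sigma} + 13x^{4\sigma} + 6x^{6\sigma} + x^{8\sigma} \\
&\qquad\quad - 14x^{2+2\sigma} + 4x^{4+2\sigma} - 14x^{2+4\sigma} + 4x^{4+4\sigma} - 4x^{2+6\sigma} \big)\, y^4 \\
&\quad - 2x^{6\sigma} \left( 1 - x^2 + 3x^{2\sigma} + x^{4\sigma} - 2x^{2+2\sigma} \right) y^6 + x^{8\sigma} y^8.
\end{aligned} \]
In particular, we have $c_1 \approx 1.38629$ and $c_2 \approx 3.51610$.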
$\sigma$                 1         2         3         4         5
$\kappa_\sigma^{-1}$     3.30027   2.18096   1.82912   1.65183   1.54322

Table 1. Exponential growth rates $\kappa_\sigma^{-1}$ for $\sigma$-canonical joint structures
with arc-length $\ge \sigma + 2$.

Proof. Set $G_1(x) = G(\eta, \eta, \eta_1, \eta_2)$ and $L = \mathbb{C}(x)[T_\sigma^{[\sigma+2]}(x)]$. Combining eq. (3.1) in
Lemma 1 and eq. (4.1) in Theorem 2 we compute
\[ H_\sigma(x) = T_\sigma^{[\sigma+2]}(x)^2\, G(\eta, \eta, \eta_1, \eta_2) = T_\sigma^{[\sigma+2]}(x)^2\, G_1(x), \]
where $G_1(x)$ satisfies the quadratic equation
\[ A(\eta, \eta)\, G_1(x)^2 + B(\eta, \eta, \eta_1, \eta_2)\, G_1(x) + C(\eta, \eta) = 0. \tag{5.2} \]
Note that $A(\eta, \eta)$, $B(\eta, \eta, \eta_1, \eta_2)$ and $C(\eta, \eta)$ are elements of the quadratic field extension
$L/\mathbb{C}(x)$. Thus eq. (5.2) implies that $G_1(x)$ is algebraic over $L$, that is, the field extension
$L[G_1(x)]/L$ is finite, whence the extension $L[G_1(x)]/\mathbb{C}(x)$ is finite. Therefore $G_1(x)$ is
algebraic over $\mathbb{C}(x)$. Clearly this implies that $H_\sigma(x)$ is algebraic over $\mathbb{C}(x)$ and in
particular D-finite [20]. Pringsheim's Theorem [21] guarantees that $H_\sigma(x)$ has a dominant real
positive singularity $\kappa_\sigma$. We verify by explicit computation that for $1 \le \sigma \le 5$, the
singularity $\kappa_\sigma$ is the unique, minimal, positive real solution of the equation
$Q(x, T_\sigma^{[\sigma+2]}(x)) = 0$ and a branch-point singularity of the square root. We list the values of
$\kappa_\sigma^{-1}$ in Table 1. Accordingly, at $\kappa_\sigma$, $H_\sigma(x)$ coincides with its singular expansion and is given by
\[ H_\sigma(x) = h_0 + h_1 (\kappa_\sigma - x)^{1/2} + O(\kappa_\sigma - x). \]
Using Theorem 4 and Theorem 5 we arrive at
\[ H_\sigma(s) \sim \frac{h_1\, \kappa_\sigma^{1/2}}{\Gamma(-\frac{1}{2})}\, s^{-3/2} \left( \kappa_\sigma^{-1} \right)^{s}. \]
Setting $c_\sigma = h_1\, \kappa_\sigma^{1/2} / \Gamma(-\frac{1}{2})$, we compute $c_1 \approx 1.38629$ and $c_2 \approx 3.51610$, completing the proof.

□
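The growth rates $\kappa_\sigma^{-1}$ can be reproduced numerically by locating the first positive zero of $Q(x, T_\sigma^{[\sigma+2]}(x))$. The sketch below is a hedged illustration: it assumes $T_\sigma^{[\lambda]}(x) = F(u_\sigma x^2/v_\lambda^2)/v_\lambda$ with $u_\sigma = x^{2\sigma-2}/(x^{2\sigma} - x^2 + 1)$, $v_\lambda = 1 - x + u_\sigma \sum_{h=2}^{\lambda} x^h$, evaluates $F(w)$ in the algebraically equivalent form $2/(1 + \sqrt{1 - 4w})$ for numerical stability, and uses the polynomial $Q$ of Theorem 3 written in the abbreviations $s = x^2$, $q = x^{2\sigma}$, $p = 1 - s + q$. Function names, step size and tolerance are arbitrary choices.

```python
import math


def t_value(sigma, x):
    """T_sigma^{[sigma+2]}(x) for real x below its dominant singularity."""
    lam = sigma + 2
    u = x ** (2 * sigma - 2) / (x ** (2 * sigma) - x ** 2 + 1)
    v = 1 - x + u * sum(x ** h for h in range(2, lam + 1))
    w = u * x ** 2 / v ** 2
    return 2 / (1 + math.sqrt(1 - 4 * w)) / v   # F(w) / v with F(w) = 2 / (1 + sqrt(1 - 4w))


def q_value(sigma, x):
    """Q(x, T(x)) with s = x^2, q = x^(2 sigma), p = 1 - s + q and y2 = T(x)^2."""
    s, q = x ** 2, x ** (2 * sigma)
    p = 1 - s + q
    y2 = t_value(sigma, x) ** 2
    c1 = 2 * q * p ** 2 * (-3 + 3 * s - q + q ** 2 - 2 * s * q)
    c2 = q ** 2 * (3 - 6 * s + 3 * s ** 2 + 10 * q + 13 * q ** 2 + 6 * q ** 3 + q ** 4
                   - 14 * s * q + 4 * s ** 2 * q - 14 * s * q ** 2
                   + 4 * s ** 2 * q ** 2 - 4 * s * q ** 3)
    c3 = -2 * q ** 3 * (1 - s + 3 * q + q ** 2 - 2 * s * q)
    return p ** 4 + c1 * y2 + c2 * y2 ** 2 + c3 * y2 ** 3 + q ** 4 * y2 ** 4


def kappa(sigma, step=1e-3, tol=1e-12):
    """Smallest positive x at which Q(x, T(x)) changes sign (scan, then bisection)."""
    lo, hi = step, 2 * step
    while q_value(sigma, hi) > 0:
        lo, hi = hi, hi + step
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if q_value(sigma, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for sigma in (1, 2):
        print(sigma, 1 / kappa(sigma))
```

The scan stops at the first sign change, which lies below the singularity of $T_\sigma^{[\sigma+2]}$ for the cases tried here, so the square root stays real throughout; this ordering is an observation for these parameters, not a general guarantee.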

In Fig. 13 we showcase the quality of the asymptotic formula for $\sigma = 2$ and arc-length
four, implied by Theorem 3.

6. Discussion 

In this paper we analyzed the biologically relevant class of canonical joint structures
having arc-length greater than or equal to four. While it is straightforward to derive
the generating function of joint structures from the (eleven) recursion relations of the
original rip-grammar (implied by Proposition 1) [12], the generating function obtained
this way would be "impossible" to write down. This approach would be neither suitable for
deriving any asymptotic formulas nor would it allow us to deal with specific stack-length
Fig. 13. Exact enumeration versus asymptotic formula. We contrast the
numbers of 2-canonical joint structures with arc-length $\ge 4$, $H_2(s)$, versus
$c\, s^{-3/2}\, 2.18096^{s}$. For representational purposes we separate the curves by
setting the constant $c$ to a power of ten.



conditions. Therefore we do not use the recurrences implied by the rip-grammar [12].
Instead we build our theory as in [17] around the concept of shapes, which we "color" in
order to rule out certain (bad) inflation scenarios. Passing from shapes to refined shapes
changes the shape-grammar as well as the underlying generating functions. The refined
shapes are key to the generating functions since the collapsing of stems preserves vital
information of the interaction structure. It is therefore not surprising that a shape induces
joint structures via inflation, see Theorem 2.

As canonical joint structures of arc-length at least four constitute a novel combinatorial
class, it is of interest to compare them with the classes of RNA secondary structures (having
generating function $\sum_s Q_2(s) z^s$) and 3-noncrossing pseudoknot structures ($\sum_s Q_3(s) z^s$).
Here a 3-noncrossing structure has a diagram representation in which there are no three
mutually crossing arcs. Indeed, RNA secondary structures are joint structures
without any exterior arcs. Furthermore, any joint structure can be interpreted as a
particular 3-noncrossing structure, by rotating the bottom structure around its endpoint by
180 degrees, then aligning the two backbones and drawing all exterior arcs in the upper
halfplane, see Fig. 14. For long sequences, for the numbers of canonical secondary structures
$Q_2(s)$ [22], joint structures $H_2(s)$ and 3-noncrossing pseudoknot structures $Q_3(s)$ [18], all
having arc-length at least four, we find
\[ \begin{aligned}
Q_2(s) &\sim 1.4848\, s^{-3/2}\, 1.8489^{s}, \\
H_2(s) &\sim 3.5161\, s^{-3/2}\, 2.1810^{s}, \\
Q_3(s) &\sim 5546\, s^{-5}\, 2.5410^{s},
\end{aligned} \]
Fig. 14. Interpretation of joint structures as 3-noncrossing structures. A
joint structure (left) can be represented as 3-noncrossing structure (top)
by rotating the bottom sequence around its endpoint and aligning the two
backbones and finally drawing all exterior arcs (red) in the upper halfplane.







Fig. 15. How joint structures "fit" in: we display the numbers of secondary
structures (red), joint structures (blue) and 3-noncrossing pseudoknot
structures (brown). All structure classes are canonical and exhibit
arc-length greater than or equal to four.

see also Fig. 15.

We can report that joint structures share features with secondary structures as well as 
with 3-noncrossing structures. Indeed, as is the case for secondary structures, they can be 
MFE-folded in polynomial time and, like RNA pseudoknot structures, they exhibit crossing 
arcs and form a truly shape-based structure class. However, in contrast to 3-noncrossing 
structures, refined shapes have algebraic generating functions (as opposed to D-finite 
ones) and satisfy simple recurrences. 
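
The interpretation of a joint structure as a 3-noncrossing structure (Fig. 14) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the construction used in the paper: the encoding of a joint structure by three arc lists, the 1-based left-to-right labelling of both backbones, and the helper names `to_single_backbone`, `crosses` and `is_3_noncrossing` are all hypothetical.

```python
from itertools import combinations


def to_single_backbone(n, m, top_arcs, bottom_arcs, exterior_arcs):
    """Map a joint structure to arcs over a single backbone of length n + m.

    Assumed encoding: the top backbone has vertices 1..n and the bottom
    backbone has vertices 1..m, both read left to right. Rotating the bottom
    backbone by 180 degrees reverses its order, so bottom vertex j is placed
    at position n + (m + 1 - j).
    """
    def flip(j):
        return n + (m + 1 - j)

    arcs = [tuple(sorted(arc)) for arc in top_arcs]
    arcs += [tuple(sorted((flip(i), flip(j)))) for i, j in bottom_arcs]
    arcs += [(i, flip(j)) for i, j in exterior_arcs]
    return arcs


def crosses(a, b):
    """Return True if arcs a and b cross, i.e. i < k < j < l."""
    (i, j), (k, l) = sorted([a, b])
    return i < k < j < l


def is_3_noncrossing(arcs):
    """Brute-force check that no three arcs are mutually crossing."""
    return not any(
        crosses(a, b) and crosses(a, c) and crosses(b, c)
        for a, b, c in combinations(arcs, 3)
    )


# Toy example: one interior arc per backbone and two exterior arcs.
example = to_single_backbone(6, 6, [(1, 5)], [(2, 6)], [(3, 1), (4, 3)])
```

In the toy example the two exterior arcs become nested after the rotation, and each of them crosses only interior arcs, so pairs of crossing arcs occur but no three arcs are mutually crossing.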

Let us finally outline future research. With this paper the combinatorics of joint 
structures is complete. The next step is to study their topology, i.e., to understand how 
joint structures filter via topological genera and boundary components. This program means 
passing from the deformation retracts studied here to fat-graphs and their associated 
surfaces. 



7. Appendix 

7.1. Singularity analysis. Since explicit formulas for the coefficients of a generating 
function can be very complicated or even impossible to obtain, we estimate the coefficients 
in terms of an exponential factor and a subexponential factor. Singularity analysis provides 
a framework for extracting the asymptotics of these coefficients. The key to obtaining 
asymptotic formulas for the coefficients of a generating function is its dominant 
singularities. The theorem of Pringsheim [23, 24] guarantees that a combinatorial generating 
function with nonnegative coefficients has its radius of convergence as a dominant 
singularity. Furthermore, for all our generating functions it is the unique dominant 
singularity. The derivation of exponential growth rates and subexponential factors from 
singular expansions of generating functions relies mainly on the transfer theorems [23]. 

To be precise, we say a function $f(z)$ is $\Delta_\rho$ analytic at its dominant singularity 
$z = \rho$ if it is analytic in some domain 
$\Delta_\rho(\phi, r) = \{ z \mid |z| < r,\ z \neq \rho,\ |\mathrm{Arg}(z - \rho)| > \phi \}$, for some 
$\phi, r$, where $r > |\rho|$ and $0 < \phi < \frac{\pi}{2}$. We use the notation 

$$\bigl( f(z) = \Theta(g(z)) \text{ as } z \to \rho \bigr) \iff \bigl( f(z)/g(z) \to c \text{ as } z \to \rho \bigr),$$

where $c$ is some constant. Let $[z^n] f(z)$ denote the coefficient of $z^n$ in the power series 
expansion of $f(z)$ at $z = 0$. Since the Taylor coefficients have the property 

$$\forall\, \gamma \in \mathbb{C} \setminus \{0\}: \quad [z^n] f(z) = \gamma^n\, [z^n] f\!\left(\frac{z}{\gamma}\right),$$

we can, without loss of generality, reduce our analysis to the case where $z = 1$ is the 
unique dominant singularity. The following theorems transfer the asymptotic expansion 
of a function around its unique dominant singularity to the asymptotics of the function's 
coefficients. 

Theorem 4. [23] Let $f(z)$ be a $\Delta_1$ analytic function at its unique dominant singularity 
$z = 1$. Let 

$$g(z) = (1 - z)^{\alpha} \log^{\beta}\!\left(\frac{1}{1 - z}\right), \qquad \alpha, \beta \in \mathbb{R}.$$

Suppose that in the intersection of a neighborhood of 1 with the domain of analyticity of $f$ 
we have 

$$f(z) = O(g(z)) \quad \text{for } z \to 1. \tag{7.1}$$

Then we have 

$$[z^n] f(z) = O\bigl([z^n] g(z)\bigr). \tag{7.2}$$
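
The reduction to a dominant singularity at $z = 1$ can be checked on a small example. The sketch below is illustrative and not taken from the paper: it assumes the Catalan generating function $(1 - \sqrt{1 - 4z})/(2z)$, whose dominant singularity is $\rho = 1/4$, and the function names are hypothetical. Rescaling by $\rho$ removes the exponential factor $4^n$, and the identity $[z^n] f(z) = \gamma^n [z^n] f(z/\gamma)$ with $\gamma = 1/\rho$ recovers the original coefficients exactly.

```python
from fractions import Fraction
from math import comb


def catalan(n):
    """n-th Catalan number, i.e. [z^n] of (1 - sqrt(1 - 4z)) / (2z)."""
    return comb(2 * n, n) // (n + 1)


def rescaled_coefficient(n, rho=Fraction(1, 4)):
    """[z^n] f(rho * z): coefficient after moving the singularity to z = 1."""
    return catalan(n) * rho ** n


def recovered_coefficient(n, rho=Fraction(1, 4)):
    """Apply [z^n] f(z) = gamma^n [z^n] f(z / gamma) with gamma = 1 / rho."""
    gamma = 1 / rho
    return gamma ** n * rescaled_coefficient(n, rho)


if __name__ == "__main__":
    for n in (1, 10, 100):
        # The rescaled coefficients decay only polynomially in n.
        print(n, float(rescaled_coefficient(n)), recovered_coefficient(n) == catalan(n))
```

Exact rational arithmetic via `fractions.Fraction` is used so that the identity holds without rounding error.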
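Theorem 5. [23] Suppose $f(z) = (1 - z)^{-\alpha}$, $\alpha \in \mathbb{C} \setminus \mathbb{Z}_{\le 0}$, then 

$$[z^n] f(z) \sim \frac{n^{\alpha-1}}{\Gamma(\alpha)} \left[ 1 + \frac{\alpha(\alpha-1)}{2n} + \frac{\alpha(\alpha-1)(\alpha-2)(3\alpha-1)}{24 n^2} + \frac{\alpha^2(\alpha-1)^2(\alpha-2)(\alpha-3)}{48 n^3} + O\!\left(\frac{1}{n^4}\right) \right]. \tag{7.3}$$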
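
The expansion for the coefficients of $(1 - z)^{-\alpha}$ can be checked numerically against the exact values $[z^n](1 - z)^{-\alpha} = \prod_{k=1}^{n} (\alpha + k - 1)/k$. The following Python sketch is a sanity check under the assumption of real $\alpha > 0$; the function names are hypothetical and the expansion is truncated after the $1/n^3$ term.

```python
import math


def exact_coefficient(alpha, n):
    """[z^n] (1 - z)^(-alpha), computed as prod_{k=1}^{n} (alpha + k - 1) / k."""
    value = 1.0
    for k in range(1, n + 1):
        value *= (alpha + k - 1) / k
    return value


def asymptotic_coefficient(alpha, n):
    """Asymptotic expansion of the coefficient, truncated after the 1/n^3 term."""
    a = alpha
    bracket = (
        1
        + a * (a - 1) / (2 * n)
        + a * (a - 1) * (a - 2) * (3 * a - 1) / (24 * n**2)
        + a**2 * (a - 1) ** 2 * (a - 2) * (a - 3) / (48 * n**3)
    )
    return n ** (a - 1) / math.gamma(a) * bracket


def relative_error(alpha, n):
    """Relative deviation of the truncated expansion from the exact value."""
    exact = exact_coefficient(alpha, n)
    return abs(asymptotic_coefficient(alpha, n) - exact) / exact


if __name__ == "__main__":
    for n in (10, 50, 200):
        print(n, relative_error(0.5, n), relative_error(1.5, n))
```

The relative error should shrink roughly like $n^{-4}$, in line with the $O(1/n^4)$ remainder.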




7.2. Symbolic Enumeration. Symbolic enumeration [23] plays an important role in the 
computation of generating functions. We first introduce the notion of a combinatorial 
class. Let $\mathbf{z} = (z_1, \dots, z_d)$ be a vector of $d$ formal variables and 
$\mathbf{k} = (k_1, \dots, k_d)$ be a vector of integers of the same dimension. We use the 
simplified notation 

$$\mathbf{z}^{\mathbf{k}} := z_1^{k_1} \cdots z_d^{k_d}.$$

Definition 1. A combinatorial class of dimension $d$, or simply a class, is an ordered pair 
$(\mathcal{A}, w_{\mathcal{A}})$ where $\mathcal{A}$ is a finite or denumerable set and 
$w_{\mathcal{A}} : \mathcal{A} \to \mathbb{Z}_{\ge 0}^{d}$ is a size-function such that 
$w_{\mathcal{A}}^{-1}(\mathbf{n})$ is finite for any $\mathbf{n} \in \mathbb{Z}_{\ge 0}^{d}$. 
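
As an illustration of the finiteness condition (this example is not taken from the paper), consider binary words with the two-dimensional size that records the number of zeros and the number of ones, written $\#_0$ and $\#_1$ here:

```latex
\[
\mathcal{W} = \{0,1\}^{*}, \qquad
w_{\mathcal{W}}(a) = \bigl(\#_0(a), \#_1(a)\bigr) \in \mathbb{Z}_{\ge 0}^{2},
\qquad
\bigl| w_{\mathcal{W}}^{-1}(n_1, n_2) \bigr| = \binom{n_1 + n_2}{n_1} < \infty .
\]
```

By contrast, taking the number of ones alone as a one-dimensional size does not give a class, since the infinitely many words $0^k$, $k \ge 0$, would all have size 0.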

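Given a class $(\mathcal{A}, w_{\mathcal{A}})$, the size of an element $a \in \mathcal{A}$ is denoted by 
$w_{\mathcal{A}}(a)$, or simply $w(a)$. We consistently denote by $\mathcal{A}_{\mathbf{n}}$ the set of 
elements in $\mathcal{A}$ that have size $\mathbf{n}$ and use the same group of letters for the 
cardinality $A_{\mathbf{n}} = |\mathcal{A}_{\mathbf{n}}|$. The sequence $\{A_{\mathbf{n}}\}$ is called the 
counting sequence of class $\mathcal{A}$. The generating function of a class $(\mathcal{A}, w_{\mathcal{A}})$ 
is given by 

$$A(\mathbf{z}) = \sum_{a \in \mathcal{A}} \mathbf{z}^{w_{\mathcal{A}}(a)} = \sum_{\mathbf{n}} A_{\mathbf{n}}\, \mathbf{z}^{\mathbf{n}}.$$

There are two special classes: $\mathcal{E}$ and $\mathcal{Z}_i$, which contain only one element of size 
$\mathbf{0}$ and $\mathbf{e}_i$, respectively, where $\mathbf{e}_i$ denotes the $i$-th unit vector. In 
particular, the generating functions of the classes $\mathcal{E}$ and $\mathcal{Z}_i$ are 

$$E(\mathbf{z}) = 1 \quad \text{and} \quad Z_i(\mathbf{z}) = z_i.$$

Next we introduce some basic constructions that constitute the core of a specification 
language for combinatorial structures. Let $\mathcal{A}$ and $\mathcal{B}$ be combinatorial classes of 
dimension $d$. Suppose $\mathcal{A}_i$ are combinatorial classes of dimension 1. We define 

• $(\mathcal{A}_1, \dots, \mathcal{A}_d) := \{ c = (a_1, \dots, a_d) \mid a_i \in \mathcal{A}_i \}$ and for 
  $c = (a_1, \dots, a_d) \in (\mathcal{A}_1, \dots, \mathcal{A}_d)$, 
  $$w_{(\mathcal{A}_1, \dots, \mathcal{A}_d)}(c) = \bigl( w_{\mathcal{A}_1}(a_1), \dots, w_{\mathcal{A}_d}(a_d) \bigr),$$

• $\mathcal{A} + \mathcal{B} := \mathcal{A} \cup \mathcal{B}$, if $\mathcal{A} \cap \mathcal{B} = \varnothing$, and for 
  $c \in \mathcal{A} + \mathcal{B}$, 
  $$w_{\mathcal{A} + \mathcal{B}}(c) = \begin{cases} w_{\mathcal{A}}(c) & \text{if } c \in \mathcal{A}, \\ w_{\mathcal{B}}(c) & \text{if } c \in \mathcal{B}, \end{cases}$$

• $\mathcal{A} \times \mathcal{B} := \{ c = (a, b) \mid a \in \mathcal{A},\ b \in \mathcal{B} \}$ and for 
  $c \in \mathcal{A} \times \mathcal{B}$, 
  $$w_{\mathcal{A} \times \mathcal{B}}(c) = w_{\mathcal{A}}(a) + w_{\mathcal{B}}(b),$$

• $\mathrm{Seq}(\mathcal{A}) := \mathcal{E} + \mathcal{A} + (\mathcal{A} \times \mathcal{A}) + (\mathcal{A} \times \mathcal{A} \times \mathcal{A}) + \cdots$. 

Plainly, $\mathrm{Seq}(\mathcal{A})$ defines a proper combinatorial class if and only if $\mathcal{A}$ 
contains no element of size $\mathbf{0}$. We immediately observe 

Proposition 2. Suppose $\mathcal{A}$, $\mathcal{B}$ and $\mathcal{C}$ are combinatorial classes of dimension $d$ 
having the generating functions $A(\mathbf{z})$, $B(\mathbf{z})$ and $C(\mathbf{z})$. Let $\mathcal{A}_i$ be 
combinatorial classes of dimension 1 having the generating functions $A_i(z)$. Then 

(a) $\mathcal{C} = (\mathcal{A}_1, \mathcal{A}_2, \dots, \mathcal{A}_d) \Longrightarrow C(\mathbf{z}) = A_1(z_1)\, A_2(z_2) \cdots A_d(z_d)$ 

(b) $\mathcal{C} = \mathcal{A} + \mathcal{B} \Longrightarrow C(\mathbf{z}) = A(\mathbf{z}) + B(\mathbf{z})$ 

(c) $\mathcal{C} = \mathcal{A} \times \mathcal{B} \Longrightarrow C(\mathbf{z}) = A(\mathbf{z}) \cdot B(\mathbf{z})$ 

(d) $\mathcal{C} = \mathrm{Seq}(\mathcal{A}) \Longrightarrow C(\mathbf{z}) = \dfrac{1}{1 - A(\mathbf{z})}$ 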
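
The translation rules for sum, product and sequence can be tried out on truncated power series. The Python sketch below covers only the one-dimensional case $d = 1$; representing a generating function by its list of coefficients, the helper names `add`, `mul` and `seq`, and the example class of integer compositions are assumptions made for illustration and do not appear in the paper.

```python
def add(a, b):
    """Coefficients of A(z) + B(z), the generating function of A + B."""
    return [x + y for x, y in zip(a, b)]


def mul(a, b):
    """Coefficients of A(z) * B(z), the generating function of A x B (truncated)."""
    n = min(len(a), len(b))
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(n)]


def seq(a):
    """Coefficients of 1 / (1 - A(z)), the generating function of Seq(A)."""
    if a[0] != 0:
        raise ValueError("Seq(A) requires that A has no element of size 0")
    c = [1] + [0] * (len(a) - 1)
    for k in range(1, len(a)):
        # C = 1 + A * C, read off coefficient by coefficient.
        c[k] = sum(a[i] * c[k - i] for i in range(1, k + 1))
    return c


N = 8                                # truncation order
E = [1] + [0] * (N - 1)              # neutral class: one element of size 0
Z = [0, 1] + [0] * (N - 2)           # atom class: one element of size 1
positive_integers = mul(Z, seq(Z))   # Z x Seq(Z), i.e. z / (1 - z)
compositions = seq(positive_integers)
```

Here `compositions` counts the ways of writing $n$ as an ordered sum of positive integers, and the identity $\mathrm{Seq}(\mathcal{A}) = \mathcal{E} + \mathcal{A} \times \mathrm{Seq}(\mathcal{A})$ can be verified on the coefficient lists with `add` and `mul`.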